This paper proposes a novel method for geo-tracking, i.e., continuous metric self-localization in outdoor environments by registering a vehicle's sensor information against aerial imagery of an unseen target region. Geo-tracking methods offer the potential to supplant both the noisy signals of global navigation satellite systems (GNSS) and the expensive, hard-to-maintain prior maps typically used for this purpose. The proposed geo-tracking method aligns data from onboard cameras and LiDAR sensors with geo-registered orthophotos to continuously localize the vehicle. We train a model in a metric learning setting to extract visual features from ground and aerial images. The ground features are projected into a top-down perspective via the LiDAR points and matched with the aerial features to determine the relative pose between the vehicle and the orthophoto. Our method is the first to use onboard cameras in an end-to-end differentiable model for metric self-localization on unseen orthophotos. It exhibits strong generalization, is robust to changes in the environment, and requires only geo-poses as ground truth. We evaluate our approach on the KITTI-360 dataset and achieve a mean absolute position error (APE) of 0.94 m. We further compare against previous methods on the KITTI odometry dataset and achieve state-of-the-art results on the geo-tracking task.
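A minimal sketch of the kind of matching such an approach implies: the top-down ground feature map (obtained by projecting camera features via the LiDAR points) is cross-correlated with the aerial feature map to find the most likely translation offset. The paper's feature extractors, rotation handling, and differentiable pose estimation are not reproduced here; all shapes and names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def match_bev_to_aerial(bev_feat, aerial_feat):
    """Correlate a top-down ground feature map with a larger aerial feature map
    and return the most likely (dy, dx) offset plus a soft score map.

    bev_feat:    (C, h, w) features projected from camera pixels via LiDAR points
    aerial_feat: (C, H, W) features extracted from the geo-registered orthophoto
    """
    # Treat the BEV map as a correlation kernel slid over the aerial map.
    scores = F.conv2d(aerial_feat.unsqueeze(0), bev_feat.unsqueeze(0))  # (1, 1, H-h+1, W-w+1)
    scores = scores.squeeze()
    idx = torch.argmax(scores)
    dy, dx = divmod(idx.item(), scores.shape[-1])
    return (dy, dx), torch.softmax(scores.flatten(), dim=0).view_as(scores)
```

In an end-to-end model the resulting score map would feed a differentiable pose estimate rather than a hard argmax.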
While current methods for interactive video object segmentation (iVOS) rely on scribble-based interactions to generate precise object masks, we propose a click-based interactive video object segmentation (CiVOS) framework to keep the required user workload as small as possible. CiVOS builds on decoupled modules reflecting user interaction and mask propagation. The interaction module converts click-based interactions into an object mask, which the propagation module then infers for the remaining frames. Additional user interactions allow the object mask to be refined. The approach is extensively evaluated on the popular interactive DAVIS dataset, with the unavoidable adaptation of replacing scribble-based interactions with click-based ones. We consider several strategies for generating clicks during the evaluation to reflect various user inputs, and adjust the DAVIS performance metric to allow a hardware-independent comparison. The proposed CiVOS pipeline achieves competitive results while requiring a lower user workload.
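A minimal sketch of the decoupled interaction/propagation loop described above, using placeholder networks; the actual CiVOS modules, click encodings, and refinement strategy are not reproduced, and the fixed annotated frame is an assumption for brevity.

```python
def civos_round(frames, clicks_per_round, interaction_net, propagation_net, annotated_t=0):
    """One interaction/propagation loop in the spirit of a click-based iVOS framework.

    interaction_net(frame, clicks, prev_mask) -> mask   # converts clicks into an object mask
    propagation_net(frames, mask, t)          -> masks  # spreads the mask to the remaining frames
    Both callables are placeholders for the decoupled modules.
    """
    masks = [None] * len(frames)
    prev_mask = None
    for clicks in clicks_per_round:
        # 1) Interaction module: turn positive/negative clicks into a mask on one frame.
        prev_mask = interaction_net(frames[annotated_t], clicks, prev_mask)
        # 2) Propagation module: infer masks for all remaining frames from that mask.
        masks = propagation_net(frames, prev_mask, annotated_t)
    return masks
```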
Semantic segmentation models require large amounts of hand-labeled training data, which is costly and time-consuming to produce. To this end, we present a label fusion framework that is able to improve semantic pixel labels of video sequences in an unsupervised manner. We make use of a 3D mesh representation of the environment and fuse the predictions of different frames into a consistent representation using semantic mesh textures. Rendering the semantic mesh with the original intrinsic and extrinsic camera parameters yields a set of improved semantic segmentation images. Thanks to our optimized CUDA implementation, we are able to exploit the full c-dimensional probability distribution over c classes in an uncertainty-aware manner. We evaluate our method on the ScanNet dataset, where we improve annotations from 52.05% to 58.25%. We publish the source code of our framework online to foster future research in this area (https://github.com/fferflo/semantic-meshes). To the best of our knowledge, this is the first publicly available label fusion framework for semantic image segmentation based on meshes with semantic textures.
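To illustrate the uncertainty-aware fusion idea, the sketch below accumulates each frame's full class probability distribution into per-texel evidence instead of fusing only hard labels. The pixel-to-texel lookup, rendering, and the paper's CUDA implementation are assumed to exist elsewhere; names and shapes are illustrative.

```python
import numpy as np

def fuse_frame(texel_logprobs, texel_ids, frame_probs, eps=1e-8):
    """Accumulate one frame's c-dimensional class probabilities into mesh texels.

    texel_logprobs: (T, C) running sum of log-probabilities per mesh texel
    texel_ids:      (H, W) id of the texel each pixel projects to (-1 = no hit)
    frame_probs:    (H, W, C) softmax output of the segmentation model for this frame
    """
    valid = texel_ids >= 0
    ids = texel_ids[valid]
    logp = np.log(frame_probs[valid] + eps)   # keep the full distribution, not just the argmax
    np.add.at(texel_logprobs, ids, logp)      # uncertainty-aware accumulation per texel
    return texel_logprobs

# After all frames are fused, each texel's label is the class with the highest evidence:
# labels = texel_logprobs.argmax(axis=1)
```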
Deep learning-based models, such as recurrent neural networks (RNNs), have been applied to various sequence learning tasks with great success. Following this, these models are increasingly replacing classical approaches for motion prediction in object tracking applications. On the one hand, these models can capture complex object dynamics with less modeling effort; on the other hand, they depend on large amounts of training data for parameter tuning. To this end, we introduce a method for generating synthetic trajectory data of unmanned aerial vehicles (UAVs) in image space. Since UAVs, or rather quadrotors, are dynamical systems, they cannot follow arbitrary trajectories. Under the prerequisite that UAV trajectories fulfill a smoothness criterion corresponding to a minimal change of higher-order motion, methods for planning aggressive quadrotor flights can be exploited to generate optimal trajectories through a sequence of 3D waypoints. By projecting these maneuver trajectories, which are suitable for controlling quadrotors, into image space, a versatile trajectory dataset is obtained. To demonstrate the applicability of the synthetic trajectory data, we show that an RNN-based prediction model trained on the generated data can outperform classical reference models on a real-world UAV tracking dataset. The evaluation is performed on a publicly available anti-UAV dataset.
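A simplified sketch of the data generation idea: interpolate a smooth trajectory through 3D waypoints and project it into image space with a pinhole model. A cubic spline stands in for the minimum-snap-style optimization used for aggressive quadrotor flight planning, and the waypoints are assumed to be given in camera coordinates with positive depth.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def synthetic_image_trajectory(waypoints_3d, times, K, n_samples=100):
    """Generate a smooth 3D trajectory through waypoints and project it to image space.

    waypoints_3d: (N, 3) points in camera coordinates, times: (N,) timestamps,
    K: 3x3 pinhole camera intrinsic matrix. All names are illustrative.
    """
    spline = CubicSpline(times, waypoints_3d, axis=0)   # smooth position over time
    t = np.linspace(times[0], times[-1], n_samples)
    xyz = spline(t)                                      # (n_samples, 3) camera-frame points
    uvw = xyz @ K.T                                      # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]                        # (n_samples, 2) pixel coordinates
    return uv
```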
In applications such as object tracking, time-series data inevitably carry missing observations. Following the success of deep learning-based models for various sequence learning tasks, these models are increasingly replacing classical approaches for inferring an object's motion state in tracking applications. While traditional tracking approaches can handle missing observations, most of their deep counterparts are by default not suited for this. To this end, this paper introduces a transformer-based approach for handling missing observations in variable-length trajectory input data. The model is formed indirectly by successively increasing the complexity of the demanded inference tasks. Starting from reproducing noise-free trajectories, the model then learns to infer trajectories from noisy inputs. By providing missing tokens, i.e., binary-encoded missing events, the model learns to cope with missing data and infers a complete trajectory conditioned on the remaining inputs. In the case of a sequence of successive missing events, the model acts as a pure prediction model. The capabilities of the approach are demonstrated on synthetic data and real-world data reflecting prototypical object tracking scenarios.
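A minimal sketch (not the paper's architecture) of how binary-encoded missing events can be fed to a transformer encoder alongside the noisy observations; layer sizes and the curriculum (noise-free, then noisy, then missing inputs) are omitted and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class MissingAwareEncoder(nn.Module):
    """Trajectory encoder that receives a binary missing flag per time step.

    Missing observations are zeroed out and the flag tells the model which steps to ignore;
    the model is trained to reconstruct the complete trajectory from the remaining inputs.
    """
    def __init__(self, d_obs=2, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(d_obs + 1, d_model)             # observation + missing flag
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, d_obs)                  # reconstruct the full trajectory

    def forward(self, obs, missing):
        # obs: (B, T, d_obs) noisy positions, missing: (B, T) binary missing events (float 0/1)
        obs = obs * (1 - missing).unsqueeze(-1)                # blank out missing observations
        x = torch.cat([obs, missing.unsqueeze(-1)], dim=-1)    # binary-encoded missing token
        return self.head(self.encoder(self.embed(x)))
```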
In tasks such as tracking, time-series data inevitably carry missing observations. While traditional tracking approaches can handle missing observations, recurrent neural networks (RNNs) are designed to receive input data in every step. Furthermore, current RNN solutions, such as omitting the missing data or data imputation, are not sufficient to account for the resulting uncertainty. To this end, this paper introduces an RNN-based approach that provides a full temporal filtering cycle for motion state estimation. The Kalman filter-inspired approach can handle missing observations and outliers. To provide a full temporal filtering cycle, a basic RNN is extended to take observations and the associated belief about their accuracy into account when updating the current state. An RNN prediction model, which generates a parametrized distribution to capture the predicted state, is combined with an RNN update model, which relies on the prediction model output and the current observation. By providing the model with masking information, i.e., binary-encoded missing events, the model can overcome the limitations of standard techniques for handling missing input values. The model's capabilities are demonstrated on synthetic data reflecting prototypical pedestrian tracking scenarios.
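A minimal sketch of such a filtering cycle: a prediction model outputs a parametrized state distribution, and an update step consumes the prediction, the observation, and the missing flag. Dimensions, cell types, and the exact parametrization are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RNNFilter(nn.Module):
    """Kalman-filter-inspired predict/update cycle built from RNN components."""
    def __init__(self, d_state=4, d_obs=2, d_hidden=32):
        super().__init__()
        self.predict_rnn = nn.GRUCell(d_state, d_hidden)
        self.predict_out = nn.Linear(d_hidden, 2 * d_state)            # predicted mean and log-variance
        self.update_net = nn.Linear(2 * d_state + d_obs + 1, d_state)  # uses prediction, observation, mask

    def forward(self, observations, missing, state, h):
        # observations: (B, T, d_obs), missing: (B, T) binary missing events (float 0/1),
        # state: (B, d_state) initial state, h: (B, d_hidden) initial RNN memory
        states = []
        for obs, m in zip(observations.unbind(1), missing.unbind(1)):
            h = self.predict_rnn(state, h)
            mean, logvar = self.predict_out(h).chunk(2, dim=-1)        # parametrized predicted state
            obs_in = obs * (1 - m).unsqueeze(-1)                       # blank out missing observations
            state = self.update_net(torch.cat([mean, logvar, obs_in, m.unsqueeze(-1)], dim=-1))
            states.append(state)
        return torch.stack(states, dim=1)
```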
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
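As an illustration of the NAIVEATTACK idea (injecting triggers into the raw data before distillation runs), a hedged sketch follows; the trigger shape, poisoning ratio, and placement are arbitrary choices, and DOORPING's iterative trigger optimization during distillation is not shown.

```python
import numpy as np

def naive_attack_poison(images, labels, trigger, target_label, poison_frac=0.01, seed=0):
    """Stamp a fixed trigger patch onto a fraction of the raw training images and relabel
    them with the target class, so the poison propagates through the distillation procedure.

    images: (N, H, W, C) float array, trigger: (h, w, C) patch; names and ratios are illustrative.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(poison_frac * len(images)), replace=False)
    h, w = trigger.shape[:2]
    poisoned = images.copy()
    poisoned_labels = labels.copy()
    poisoned[idx, -h:, -w:, :] = trigger     # place trigger in the bottom-right corner
    poisoned_labels[idx] = target_label      # relabel poisoned samples to the target class
    return poisoned, poisoned_labels         # this set is then fed into dataset distillation
```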
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow, along with the required control inputs. The rotor craft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
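A compact sketch of the PRM stage under stated assumptions: the collision check against the onboard map is a placeholder callable, and the D* Lite search that plans and repairs paths over the resulting roadmap as the map changes is not shown.

```python
import numpy as np

def build_prm(n_samples, bounds, segment_is_free, k=8, seed=0):
    """Sample random 3D configurations inside the known map and connect each node to its
    k nearest neighbors whenever the straight segment between them is collision-free.

    segment_is_free(p, q): placeholder check against the vehicle's current obstacle map.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    samples = rng.uniform(lo, hi, size=(n_samples, 3))
    nodes = np.array([p for p in samples if segment_is_free(p, p)])   # keep free samples
    edges = {i: [] for i in range(len(nodes))}
    for i, p in enumerate(nodes):
        dists = np.linalg.norm(nodes - p, axis=1)
        for j in np.argsort(dists)[1:k + 1]:                          # skip the node itself
            if segment_is_free(p, nodes[j]):
                edges[i].append((int(j), float(dists[j])))
    return nodes, edges
```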
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
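A minimal sketch of the masked-token training objective described above; the model call is a placeholder for the base transformer, and tokenization, the super-resolution stage, and parallel decoding at inference time are omitted. The mask ratio and names are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_token_step(model, text_emb, image_tokens, mask_token_id, mask_ratio=0.5):
    """One masked image-token training step: hide a random subset of discrete image tokens
    and predict them conditioned on the text embedding from a pre-trained LLM.

    model(text_emb, tokens) -> (B, N, vocab_size) logits is a placeholder transformer.
    image_tokens: (B, N) discrete token ids from a pre-trained image tokenizer.
    """
    mask = torch.rand(image_tokens.shape) < mask_ratio          # positions to hide
    inputs = image_tokens.masked_fill(mask, mask_token_id)      # replace with the [MASK] token id
    logits = model(text_emb, inputs)                            # predict every token position
    loss = F.cross_entropy(logits[mask], image_tokens[mask])    # loss only on masked positions
    return loss
```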